Abstract. The zebra mussel is a nonindigenous invader of North American
lakes and rivers and one of the few freshwater bivalve molluscs having a
byssus, a sclerotized organ used by the mussel for opportunistic
attachment to hard surfaces. We have sequenced a foot-specific cDNA whose
composite protein sequence was deduced from a series of overlapping but
occasionally nonidentical cDNA fragments. The overall deduced sequence
matches tryptic peptides from a major byssal precursor protein,
Dreissena polymorpha foot protein 1 (Dpfp1). The calculated mass of
Dpfp1 is 49 kDa, but the protein is known to be extensively hydroxylated
and O-glycosylated during maturation. Purified native Dpfp1 analyzed
using matrix-assisted laser-desorption ionization mass spectrometry with
time-of-flight indicates that the protein occurs as at least two size
variants with masses of 48.6 and 54.5 kDa. In all probability, the
sequence variants reported in this study are related to the larger mass
variant. Dpfp1 has a block copolymer-like structure defined by two
consensus motifs that are sharply segregated into domains. The
N-terminal side of Dpfp1 has 22 tandem repeats of a heptapeptide
consensus (P-[V/E]-Y-P-[T/S]-[K/Q]-X); the C-terminal side has 16
repeats of a tridecapeptide motif (K-P-G-P-Y-D-Y-D-G-P-Y-D-K). Both
consensus repeats are unique, with some limited homology to other
proteins functioning in tension: marine mussel adhesives, plant
extensins, titin, and trematode eggshell precursors.

Introduction 

The zebra mussel, Dreissena polymorpha (Pallas), is a freshwater bivalve
indigenous to the river basins of the Black, Baltic, and Caspian seas.
Recently, it was accidentally introduced into one of the Great Lakes,
and in less than 10 years, its distribution has expanded into the lakes
and rivers of at least a third of the North American continent (Johnson
and Padilla, 1996). The economic impact of this expansion has been
profound and is due, in large part, to fouling (Roberts, 1990). Zebra
mussels foul by attaching opportunistically and in large numbers to a
wide variety of surfaces by means of a thread-like structure known as a
byssus (Ackerman et al., 1992). In this respect, they resemble marine
mussels (Mytilidae), which have adopted a similar strategy.

Zebra mussel byssal threads are fibrous extracellular structures
composed largely of proteins, many of which contain the
post-translationally modified amino acid 3,4-dihydroxyphenylalanine
(Dopa) (Rzepecki and Waite, 1993). Peptidyl Dopa is thus a convenient
marker of byssal precursor proteins and is thought to play an important
role in adhesion and the maturational cross-linking of byssal threads
(Waite, 1990). Three polymorphic Dopa-containing protein families have
previously been isolated and partially characterized from zebra mussel
foot tissue, the site of byssal protein synthesis and storage. The
largest of these proteins, Dreissena polymorpha foot protein 1 (Dpfp1),
has an apparent molecular weight of 76 kDa and Dopa at levels up to
6.6 mole % (Rzepecki and Waite, 1993). Like many byssal precursors from
marine mussels, Dpfp1 features Dopa residues in repeating consensus
motifs. Despite this similarity, Dpfp1 is markedly different from the
marine proteins in two respects. First, members of the Dpfp1 family have
acidic isoelectric points ranging from 5.3 to 6.5; marine byssal
precursors, in contrast, are highly basic, many with pIs exceeding the
effective resolving range of available ampholytes. Second, dreissenid
byssal precursors, including Dpfp1, are glycosylated with
N-acetylgalactosamine O-linked to serine and threonine residues; there
is, however, no evidence for glycosylation in byssal proteins from any
marine taxa. It is not known whether these differences reflect two
generally valid solutions to the problem of adhesion underwater or
represent genuine differences in the requirements for adhesive bond
formation in freshwater and marine systems.

Our efforts to determine the complete primary sequence of Dpfp1 by
traditional peptide mapping have been thwarted by the repetitive
structure and protease resistance of large regions of the protein
(Rzepecki and Waite, 1993). In this study, we report the complete
primary sequence of Dpfp1 deduced using molecular techniques. cDNA
sequence data reveal that Dpfp1 is a tandemly repetitive protein
composed of two motifs: a novel heptapeptide sequence and a
tridecapeptide consensus sequence. Unusually, these motifs are
segregated to distinct regions of the protein, a fact which almost
certainly has important consequences for the self-assembly of the zebra
mussel byssus.

Materials and Methods 

RNA extractions 

All tissues used in these experiments were excised, immediately frozen
in liquid nitrogen, and ground in a mortar chilled to -80°C. Tissue was
homogenized in a hand-held glass homogenizer (Kontes, Vineland, NJ), and
total RNA was extracted according to the methods of Chomczynski and
Sacchi (1987).

Reverse transcriptase (RT)-polymerase chain reaction 
(PCR) and 5' rapid amplification of cDNA ends 
(RACE) 

mRNA was purified from total RNA using the Oligotex mRNA spin column kit
(Qiagen, Chatsworth, CA). After purification, 1 µg mRNA was reverse
transcribed using 20 pmoles of a primer specific to polyA tracts
(polyT-LD AGAGAGATTTTTTTTTTTTTTTTTVN) with 200 units of M-MLV reverse
transcriptase (SuperScript II, Gibco-BRL) for 2 h at 37°C in buffer
supplied by the manufacturer. The reaction was quenched with 1 ml of
1X TE, pH 7.5. One percent (v/v) of the resulting first-strand cDNA was
amplified with the polymerase chain reaction (PCR) using degenerate
oligonucleotide primers based on the previously determined (Rzepecki and
Waite, 1993) amino acid sequence of the N-terminus of Dpfp1 (Dp1.N(+)
GGIACITAYGAYTGGACNGA) and an internal peptide (Dp1.A(-)
TTRTCRTAIGGICCRTCRTA). Each 50-µl reaction contained 0.25 mM of each
dNTP, 100 pmoles of each primer, and 2.5 units of Taq2000 polymerase
(Stratagene, La Jolla, CA), in a buffer containing 10 mM Tris-Cl,
1.5 mM MgCl2, 75 mM KCl, and 15 mM (NH4)2SO4. Samples were initially
denatured at 95°C for 4 min 30 s followed by 30 cycles of amplification
as follows: 95°C for 30 s, 50°C for 30 s, and 72°C for 2 min. A final
extension for 5 min at 72°C was carried out to ensure addition of 3' A
overhangs. The resulting amplification product was ligated into the
pCRII vector (Invitrogen, San Diego, CA) according to manufacturer's
instructions. The insert from the newly constructed plasmid, pDP1.NA,
was sequenced on both strands using vector-specific and degenerate
oligonucleotide primers.
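
The degeneracy of the two RT-PCR primers can be checked with a short script. This is a minimal sketch and not part of the original protocol: it assumes that the character printed as "1" in the scanned primer strings is inosine (I), treats inosine as a single non-degenerate position, and uses the standard IUPAC ambiguity codes. The function name `degeneracy` is illustrative.

```python
# IUPAC nucleotide codes mapped to the bases they can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
    "I": "I",  # inosine: one physical base, counted as non-degenerate (assumption)
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides in a degenerate primer pool."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base])
    return total


# Primer strings as read from the text, with "1" interpreted as inosine.
forward = "GGIACITAYGAYTGGACNGA"   # N-terminal primer
reverse = "TTRTCRTAIGGICCRTCRTA"   # internal-peptide primer

for name, seq in (("forward", forward), ("reverse", reverse)):
    print(name, len(seq), "nt, degeneracy", degeneracy(seq))
```

Under these assumptions both primers are 20-mers with a 16-fold degeneracy, which is low enough that 100 pmoles of each pool still supplies several picomoles of every individual sequence.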

5' RACE was performed to obtain cDNA sequence data upstream of the
region coding for the N-terminus (Frohman et al., 1988) and to
independently establish the cDNA sequence of the N-terminus. All
reactions were performed using reagents contained in the 5' RACE System
V2.0 (Life Technologies, Bethesda, MD) according to manufacturer's
instructions. Briefly, 1 µg of D. polymorpha foot tissue total RNA was
reverse-transcribed using a gene-specific primer (Dp1.GSP1(-)
TATTTTGTAGGAGTGGG). The purified first-strand cDNA was tailed with
dCTP, and PCR was performed using the supplied abridged anchor primer
(AAP; GGCCACGCGTCGACTAGTACGGGIIGGGIIGGGIIG) and Dp1.GSP1(-). Each 50-µl
reaction contained 0.25 mM of each dNTP and 20 pmoles of each primer in
1X PCR buffer (Life Technologies, Bethesda, MD) supplemented with 2 mM
MgCl2. Samples were denatured at 95°C for 4 min 30 s and equilibrated to
72°C. Two-and-one-half units of Taq2000 polymerase were added and
amplification for 25 cycles was performed under the following
conditions: 95°C for 30 s, 42°C for 30 s, and 72°C for 30 s. A final
5-min extension was performed at 72°C. A second round of PCR was
performed using AAP and a nested gene-specific primer (Dp1.GSP2(-)
TTGTTGTATAGTTCGGAATTTTAG). The reaction volume and component
concentrations were as outlined in the previous reaction. Samples were
initially denatured at 95°C for 4 min 30 s followed by 30 cycles of
amplification as follows: 95°C for 30 s, 42°C for 30 s, and 72°C for
60 s. A final extension for 5 min at 72°C was carried out to ensure
addition of 3' A overhangs. The resulting amplification products were
cloned into the pGEM-T vector (Promega, Madison, WI) according to
manufacturer's instructions. The insert from the newly constructed
plasmid, pDP1.5'UTA, was sequenced on both strands using gene-specific
primers.

Probe synthesis and cDNA library screening 

Two probes were created in this experiment to screen a D. polymorpha
foot tissue cDNA library (Eddington, 1996). A digoxigenin (DIG)-labeled
antisense RNA probe (probe #1) was generated from DdeI-digested pDP1.NA
using T7 polymerase and the DIG-RNA labeling kit (Boehringer-Mannheim)
according to manufacturer's instructions. A DIG-labeled double-stranded
DNA probe (probe #2) spanning the 5' untranslated region of Dpfp1 and
the first 172 nt coding for the mature protein was generated using the
PCR DIG probe synthesis kit (Boehringer-Mannheim) according to
manufacturer's instructions. pDP1.5'UTA was used as a template for this
reaction, and a primer specific to the 5' untranslated region of Dpfp1
(Dp1.5'UT(+) ATACTTCAGAGCATCAACCAA) and Dp1.GSP1(-) were used as
primers. Both probes were individually incorporated at a concentration
of 100 ng/ml into standard hybridization buffer + 50% formamide (5X SSC,
1% Blocking buffer (Boehringer-Mannheim), 0.1% (w/v) sarcosyl, 0.02%
(w/v) SDS, 50% formamide (v/v)). Hybridizations were carried out at 60°C
(probe #1) or 42°C (probe #2). Stringency washes for both probes were
conducted with 0.1X SSC/0.2% (w/v) SDS at 68°C.

One million plaques generated from a λZAP-Express cDNA library
(Stratagene, La Jolla, CA) were doubly screened with probes #1 and #2.
No plaques positive for probe #2 were detected, suggesting that a
full-length clone of Dpfp1 was not present in this library. Forty
plaques positive for probe #1 were cored, eluted in SM buffer (100 mM
NaCl, 50 mM Tris-Cl pH 7.5, 8 mM MgSO4, 0.1% gelatin), and tested for
insert size by PCR using vector-specific primers flanking the cDNA
insert. After secondary screening, cDNA from the plaque bearing the
largest insert was rescued as a phagemid using the ExAssist
interference-resistant helper phage kit (Stratagene, La Jolla, CA) and
sequenced using the nested deletion technique (see below).

Nested deletions 

Nested deletions were performed using the double-stranded nested
deletion kit (Pharmacia Biotech, Piscataway, NJ). In each case, 5 µg of
template was doubly digested with EcoRI and PstI, and the restriction
enzymes were heat inactivated. Digested clones were precipitated in
ethanol and resuspended in a buffer containing 1.5 M potassium acetate,
37.5 mM Tris-acetate pH 7.6, 15 mM magnesium acetate, 750 µM
β-mercaptoethanol, and 15 µg/ml bovine serum albumin (BSA). A 2-µg
sample of each digest was used for digestion with Exonuclease III. The
reactions were carried out at 23°C and aliquots taken every 5 min. All
clones yielding deletions larger than the size of the empty vector were
ligated, transformed into XL1-Blue MRF' cells (Stratagene, La Jolla,
CA), purified, and sequenced using a vector-specific primer.


RNA dot blots 

Ten micrograms of total RNA separately extracted from D. polymorpha
foot, adductor muscle, mantle, and gill tissue were diluted in an equal
volume of RNA dilution buffer (water: 20X SSC: formaldehyde; 5:3:2) and
spotted onto a positively charged nylon membrane (MSI, Westboro, MA).
The membrane was hybridized to either probe #1 as described above or to
an actin-specific double-stranded DIG-labeled DNA probe (Patwary et al.,
1996). Hybridization with the actin-specific probe was performed at 37°C
with a stringency wash using 0.5X SSC/0.1% (w/v) SDS at 68°C.

Northern hybridizations 

Three micrograms of foot tissue mRNA were subjected to
formaldehyde/agarose gel electrophoresis according to Sambrook et al.
(1989). RNA was transferred onto a positively charged nylon membrane and
hybridized overnight with probe #1.

Mass analysis of native Dpfp1

Native Dpfp1 was purified from the foot of adult zebra mussels according
to Rzepecki and Waite (1993). The mass of the native protein was
determined by matrix-assisted laser desorption-ionization mass
spectrometry with time-of-flight (MALDI-TOF) using a PerSeptive
Biosystems Voyager model in the positive ion mode and delayed
extraction. A 20-µM solution of Dpfp1 in 0.1% acetic acid was mixed with
three volumes of a saturated sinapinic acid solution (40%
acetonitrile/0.1% TFA); 2 µl of the resulting mixture (10 pmoles Dpfp1)
was placed on a sample plate and allowed to air dry. The sample was
inserted into a vacuum chamber (1 X 10 torr) and the spectra generated
from 256 pulses of a 337-nm laser were averaged. The acceleration
voltage was 25,000 V with a 90% grid voltage and a guidewire setting of
0.1%.
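
The amount of protein spotted on the sample plate follows from the dilution described above: one volume of 20-µM protein plus three volumes of matrix gives a 5-µM mixture, and 2 µl of that mixture holds 10 pmol. The sketch below only restates this arithmetic; the function name and arguments are illustrative.

```python
def pmol_spotted(stock_uM: float, matrix_volumes: float, spot_ul: float) -> float:
    """Picomoles of analyte in a spot taken from a protein/matrix mixture.

    One volume of protein stock is mixed with `matrix_volumes` volumes of
    matrix solution, so the stock is diluted (1 + matrix_volumes)-fold.
    A concentration in µmol/l multiplied by a volume in µl gives pmol.
    """
    final_uM = stock_uM / (1.0 + matrix_volumes)
    return final_uM * spot_ul


loaded = pmol_spotted(stock_uM=20.0, matrix_volumes=3.0, spot_ul=2.0)
print(f"{loaded:.1f} pmol on the plate")
```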

Results 

RNA dot blots and Northern hybridizations 

The tissue specificity of Dpfp1 is demonstrated in Figure 1. RNA dot
blots show that Dpfp1 mRNA transcripts were detected only in total RNA
extracts from foot tissue and not in extracts from gill, adductor
muscle, or mantle tissue. Identical dot blots hybridized to an
actin-specific probe were positive for all tissue types, although the
strength of the signal varied considerably between tissue types (data
not shown). These results are consistent with data obtained for marine
byssal precursor proteins (Inoue et al., 1995, 1996a; Coyne et al.,
1997; Qin et al., 1997) and support the hypothesis that Dpfp1 plays a
role as a byssal structural protein. Northern blots of foot tissue mRNA
indicated that Dpfp1 transcripts range in size from 1200 b to 1500 b,
suggesting the presence of size variants (Fig. 2).

Figure 1. Total RNA dot blots of Dreissena polymorpha foot (F), adductor
muscle (A), mantle (M), and gill (G) tissue hybridized to a
digoxigenin-labeled RNA probe specific to Dpfp1. The probe was
hybridized to 1 µg of total RNA from each tissue.

Figure 2. Northern blot of Dreissena polymorpha foot tissue mRNA
hybridized to a digoxigenin-labeled RNA probe specific to Dpfp1; 3 µg of
foot tissue mRNA was used.

Dpfp1 cDNA sequence

In Figure 3 the aligned nucleotide sequence data obtained from 5' RACE,
RT-PCR with degenerate oligonucleotide primers, and from the largest
cDNA clone isolated are presented. Each sequence differs slightly from
the others, and therefore the consensus sequence generated from this
alignment does not represent any single Dpfp1 sequence. It is likely
that differences in the data sets reflect the existence of Dpfp1
variants rather than errors introduced during amplification, because
each set of PCR sequence data was determined from at least two
independently amplified samples. The combined transcript is 1481 bp in
length and contains an open reading frame of 1332 bp coding for a
protein of 443 amino acids. Included in the transcript is a start codon
at nucleotide position 36 and two overlapping canonical polyadenylation
signals (Kozak, 1986) at nucleotide positions 1464 and 1468. The
calculated molecular weight of the deduced primary sequence is 49 kDa,
with a predicted isoelectric point of 5.29.
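
The transcript coordinates given above are mutually consistent, as a short calculation shows. This sketch assumes that the 1332-bp open reading frame includes the stop codon (1332 / 3 = 444 codons, one of which is the stop, leaving 443 residues) and that positions are 1-based; the function and key names are illustrative.

```python
def orf_summary(transcript_len: int, start_pos: int, orf_len: int) -> dict:
    """Derive codon counts and UTR lengths from 1-based transcript coordinates."""
    if orf_len % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    codons = orf_len // 3
    orf_end = start_pos + orf_len - 1        # last nucleotide of the stop codon
    return {
        "codons": codons,
        "residues": codons - 1,              # assumes the stop codon is counted in the ORF
        "orf_end": orf_end,
        "utr5_len": start_pos - 1,
        "utr3_len": transcript_len - orf_end,
    }


summary = orf_summary(transcript_len=1481, start_pos=36, orf_len=1332)
polyA_signals = (1464, 1468)
in_3utr = all(pos > summary["orf_end"] for pos in polyA_signals)
print(summary, "polyadenylation signals in 3' UTR:", in_3utr)
```

Under these assumptions the reading frame ends at nucleotide 1367, leaving a 35-nt 5' untranslated region and a 114-nt 3' untranslated region that contains both polyadenylation signals.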

The first 19 amino acids constitute a putative signal peptide that
conforms to the rule of von Heijne (1985). Computer-based modeling of
signal peptide cleavage (Nielsen et al., 1997) correctly predicts
cleavage of the signal peptide preceding the previously determined
N-terminal glycine residue of the mature protein (Rzepecki and Waite,
1993). The N-terminus of Dpfp1, as coded for by sequences generated
using 5' RACE, differs from the previously reported N-terminal sequence
(Rzepecki and Waite, 1993) in that it substitutes serine residues for
threonine at position #2, tyrosine at position #3, and aspartic acid at
position #10. None of the three independently generated 5' RACE clones
exactly coded for the previously reported N-terminus of Dpfp1.
N-terminal sequence data generated with degenerate oligonucleotide
primers more closely resemble the previously reported N-terminal
sequence but also substitute serine for aspartic acid at position #10.
It is not possible to determine from these data if the N-terminal
sequence deduced from cDNAs generated via degenerate oligonucleotide
primers reflects a genuinely different N-terminus or is simply an
artifact forced by the primers used during amplification.

The N-terminal 38 amino acids of the mature protein are relatively
enriched in threonine and serine residues and quickly give way to a
tandemly repeating heptapeptide. This generally basic motif
(P-[V/E]-Y-P-[T/S]-[K/Q]-X) is repeated 22 times in the N-terminal half
of Dpfp1 with some variation, particularly at position #7 of the
consensus sequence; however, proline residues at positions #1 and #4 and
tyrosine residues at position #3 are highly conserved (Fig. 4). RT-PCR
data differ from cDNA clone data in this region of the transcript by
omission of threonine 175 and by a G190E substitution resulting from a
transversion at nucleotide position 661.
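
The heptapeptide consensus can be written as a regular expression, which makes it easy to scan a protein sequence for back-to-back copies. The sketch below is illustrative only: the test sequence is a made-up toy string, not the Dpfp1 sequence from Figure 3, and because the repeats are reported to occur "with some variation", a strict pattern like this one would be expected to miss degenerate copies in the real protein.

```python
import re

# P-[V/E]-Y-P-[T/S]-[K/Q]-X, with X standing for any residue.
HEPTAD = re.compile(r"P[VE]YP[TS][KQ][A-Z]")


def tandem_runs(seq: str) -> list:
    """Return (start_index, copy_count) for each run of abutting motif copies."""
    runs = []
    run_start, count, prev_end = None, 0, None
    for match in HEPTAD.finditer(seq):
        if prev_end is not None and match.start() == prev_end:
            count += 1                       # this copy directly follows the previous one
        else:
            if count:
                runs.append((run_start, count))
            run_start, count = match.start(), 1
        prev_end = match.end()
    if count:
        runs.append((run_start, count))
    return runs


# Hypothetical toy sequence: three tandem copies, a GG spacer, one more copy.
toy = "MS" + "PVYPTKT" + "PEYPSQA" + "PVYPSKD" + "GG" + "PVYPTKA"
print(tandem_runs(toy))
```

For the toy string this reports a run of three copies starting at index 2 and a single copy at index 25.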

The C-terminal half of Dpfp1 is dominated by the previously reported 13
amino acid consensus sequence: K-P-G-P-Y-D-Y-D-G-P-Y-D-K (Rzepecki and
Waite, 1993). This acidic sequence is found tandemly repeated 16 times
with only slight variations from the consensus (Fig. 4). The deduced
amino acid composition of the composite Dpfp1 sequence, without signal
peptide sequence, agrees well with that of native Dpfp1 (Table I),
suggesting that the composite sequence described above is representative
of Dpfp1 mRNAs present in zebra mussel foot tissue. Examination of codon
usage for Dpfp1 (Table II) reveals a significant degree of codon bias in
amino acids that occur in conserved positions of the above-mentioned
consensus sequences (e.g., P, Y, D, K, T, G).


Mass analysis of native Dpfp1

MALDI-TOF analysis of native Dpfp1 indicates that the purified protein
is represented by two major mass variants. The lighter of the two
variants has a mass [M + H]+ = 48.6 kDa, whereas in the heavier variant,
[M + H]+ = 54.5 kDa. No peaks were detected in the 60-80 kDa range.

Discussion 

The primary structure of Dpfp1, deduced from overlapping cDNAs,
represents the first complete sequence for a dreissenid byssal protein
and an important advance in understanding the attachment strategy of the
zebra mussel. Two observations suggest that the composite sequence
generated from these data sets is likely to resemble full-length
transcripts for Dpfp1. First, the size of the composite sequence (1481
bases) closely matches the size of the largest Dpfp1 transcript as
determined by Northern blots of zebra mussel foot tissue mRNA hybridized
to a Dpfp1-specific probe. Second, the deduced amino acid composition of
the composite sequence, excluding the signal peptide, closely matches
the composition of native Dpfp1 as reported in Rzepecki and Waite
(1993).

Figure 4. Aligned motifs of Dpfp1. The consensus sequences of Dpfp1 are
presented as derived from the tandemly repeating motifs of the deduced
primary sequence. The sequences composing each motif are presented
contiguously, and amino acids in the consensus sequence occur in the
majority of aligned motifs at their respective positions. Numbers at the
beginning and end of each motif represent the position of this sequence
within the deduced amino acid sequence for Dpfp1 as presented in Figure
3. The motif in parentheses occurs in RT-PCR derived sequences in place
of the preceding cDNA motif. Amino acids are represented by their single
letter codes, and a dash (-) indicates a gap inserted for alignment
purposes.

Table I

Amino acid composition of deduced and native Dpfp1

Amino acid      Native     Deduced
Asx              136.7       134.8
Thr               75.0        82.7
Ser               34.4        33.1
Glx               70.1        52.0
Pro              238.6       234.0
Gly               76.5        68.6
Ala                7.9         2.4
Val               50.4        52.0
Met                0.7         0.0
Ile                9.9         9.5
Leu               20.4        18.9
Dopa              66.6        N.D.
Tyr               84.5       165.5
Phe                9.6        14.2
His                5.1         7.1
Lys               94.8        99.3
Arg               17.0        14.2
Trp                1.8        11.8
Total:          1000.0      1000.0

The amino acid composition of deduced Dpfp1 is determined excluding
signal peptide residues, and that of native Dpfp1 is from Rzepecki and
Waite (1993). All values are in residues per thousand residues.
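
The agreement between the two compositions in Table I can be examined residue by residue. Because Dopa arises from post-translational hydroxylation of tyrosine, the sketch below pools native Dopa with native Tyr before comparing against the deduced Tyr content; that pooling, and the helper names, are choices made for this illustration rather than an analysis reported in the paper. Values are copied from Table I (residues per thousand).

```python
NATIVE = {"Asx": 136.7, "Thr": 75.0, "Ser": 34.4, "Glx": 70.1, "Pro": 238.6,
          "Gly": 76.5, "Ala": 7.9, "Val": 50.4, "Met": 0.7, "Ile": 9.9,
          "Leu": 20.4, "Dopa": 66.6, "Tyr": 84.5, "Phe": 9.6, "His": 5.1,
          "Lys": 94.8, "Arg": 17.0, "Trp": 1.8}
DEDUCED = {"Asx": 134.8, "Thr": 82.7, "Ser": 33.1, "Glx": 52.0, "Pro": 234.0,
           "Gly": 68.6, "Ala": 2.4, "Val": 52.0, "Met": 0.0, "Ile": 9.5,
           "Leu": 18.9, "Tyr": 165.5, "Phe": 14.2, "His": 7.1, "Lys": 99.3,
           "Arg": 14.2, "Trp": 11.8}


def pool_dopa(composition: dict) -> dict:
    """Fold Dopa into Tyr, since a cDNA can only encode the unmodified residue."""
    pooled = dict(composition)
    pooled["Tyr"] = pooled.get("Tyr", 0.0) + pooled.pop("Dopa", 0.0)
    return pooled


native_pooled = pool_dopa(NATIVE)
# Native minus deduced, in residues per thousand.
differences = {aa: round(native_pooled[aa] - DEDUCED[aa], 1) for aa in DEDUCED}
largest = max(differences, key=lambda aa: abs(differences[aa]))
dopa_fraction = NATIVE["Dopa"] / native_pooled["Tyr"]

print("largest discrepancy:", largest, differences[largest])
print("Tyr + Dopa (native):", round(native_pooled["Tyr"], 1),
      "vs Tyr (deduced):", DEDUCED["Tyr"])
print("fraction of the tyrosine pool present as Dopa:", round(dopa_fraction, 2))
```

On these numbers the pooled native Tyr + Dopa content (151.1) lies within about 15 residues per thousand of the deduced Tyr content (165.5), and the largest single discrepancy is in Glx.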

Purified native Dpfp1 was subjected to MALDI-TOF analysis to resolve the
conflict between the apparent and cDNA-deduced mass estimates. SDS-PAGE
of native Dpfp1 established that the purified protein migrates as a
doublet with apparent molecular masses of 65 and 76 kDa (Rzepecki and
Waite, 1993). However, the deduced mass of Dpfp1 of 49 kDa (this work),
even allowing for an additional 6.5 kDa contributed by
post-translational glycosylation and hydroxylation (Rzepecki and Waite,
1993), is difficult to reconcile with the empirically determined
apparent masses. According to MALDI-TOF mass spectrometric analysis,
Dpfp1 exists primarily as a doublet (48.6 and 54.5 kDa) with no visible
components above 60 kDa. The mass of the larger variant is in excellent
agreement with the deduced mass of Dpfp1 after addition of
post-translational modifications. The smaller variant may represent
unmodified Dpfp1 or possibly a fully modified variant coded for by one
of the smaller Dpfp1 transcripts detected during Northern blot analysis
of mRNA from zebra mussel foot tissue (Fig. 2). This observation
confirms that Dpfp1, like many other byssal precursor proteins (see
Coyne et al., 1997; Qin et al., 1997; Taylor et al., 1996; Papov et al.,
1995), migrates anomalously during SDS-PAGE.
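
The mass comparison in this paragraph reduces to simple arithmetic, sketched below with the figures quoted in the text. The variable names are illustrative, and the script only restates the reported values; it does not model glycosylation or hydroxylation.

```python
deduced_kda = 49.0               # calculated from the composite cDNA sequence
modification_kda = 6.5           # glycosylation + hydroxylation allowance
maldi_kda = (48.6, 54.5)         # MALDI-TOF doublet
sds_page_kda = (65.0, 76.0)      # apparent masses on SDS-PAGE

predicted_kda = deduced_kda + modification_kda
maldi_gap_kda = predicted_kda - max(maldi_kda)
sds_ratios = [apparent / predicted_kda for apparent in sds_page_kda]

print(f"predicted modified mass: {predicted_kda:.1f} kDa")
print(f"difference from larger MALDI variant: {maldi_gap_kda:.1f} kDa")
print("SDS-PAGE apparent / predicted:", [round(r, 2) for r in sds_ratios])
```

The predicted modified mass (55.5 kDa) is within 1 kDa of the larger MALDI-TOF variant, whereas both SDS-PAGE estimates exceed it by more than 15%.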

In previous studies, isoelectric focusing of purified Dpfp1 suggested
the presence of at least 10 electrophoretic variants in the polymorphic
family (Rzepecki and Waite, 1993). These multiple bands may reflect
differences in the primary structure of Dpfp1 variants, nonuniform
post-translational modification of one or more forms of the protein, or
both. At least some of the variation must arise from differences in
primary structure since the N-terminus of Dpfp1 exhibited heterogeneity
at two positions (#2 and #8, Fig. 3) (Rzepecki and Waite, 1993). The
nucleotide sequences presented in Figure 3 suggest the existence of at
least two of these variants. Differences between these variants in
regions of cDNA overlap are limited to the deletion of a single codon in
the RT-PCR data and a single transversion resulting in an amino acid
substitution in one of the heptapeptide sequences.

Table II

Codon usage in Dpfp1

Amino acid  Codon    #    Amino acid  Codon    #    Amino acid  Codon    #    Amino acid  Codon    #
F           TTT      4    L           CTT      2    I           ATT      2    V           GTT      5
F           TTC      5    L           CTC      0    I           ATC      0    V           GTC      4
L           TTA      2    L           CTA      4    I           ATA     ->    V           GTA     12
L           TTG      3    L           CTG      0    M           ATG      1    V           GTG      2
S           TCT      6    P           CCT     12    T           ACT     13    A           GCT      0
S           TCC      3    P           CCC      7    T           ACC      4    A           GCC      0
S           TCA      6    P           CCA     58    T           ACA     15    A           GCA      1
S           TCG      1    P           CCG     22    T           ACG      3    A           GCG      2
Y           TAT     57    H           CAT      1    N           AAT      5    D           GAT     39
Y           TAC     12    H           CAC      2    N           AAC      3    D           GAC      9
*           TAA      1    Q           CAA      8    K           AAA     39    E           GAA      9
*           TAG      0    Q           CAG      3    K           AAG      6    E           GAG     ->
C           TGT      1    R           CGT      2    S           AGT      0    G           GGT     14
C           TGC      0    R           CGC      1    S           AGC      2    G           GGC      4
*           TGA      0    R           CGA      2    R           AGA      1    G           GGA     11
W           TGG      5    R           CGG      0    R           AGG      0    G           GGG      3

Amino acids are represented by their single letter codes, and an
asterisk (*) indicates a stop codon. These data are compiled from the
composite sequence of Dpfp1 presented in Figure 3. Where discrepancies
exist in the consensus sequence, the cDNA and 5' RACE data are used to
determine codon usage at that position.

K. E. ANDERSON AND J. H. WAITE

An examination of the codon usage data (Table II) indicates that
compositionally dominant amino acids are predominantly coded for by half
of the potentially available codons for these residues. This is
especially true of proline, tyrosine, aspartic acid, lysine, threonine,
and glycine residues, which together account for almost 75% of the amino
acid composition of Dpfp1. The pattern of codon bias in compositionally
dominant residues has also been noted in other marine byssal precursor
proteins, notably Mefp1 (Filpula et al., 1990), Mgfp1 (Inoue and Odo,
1994), Mefp1 (Inoue et al., 1996b), and, to a lesser extent, Mgfp2
(Inoue et al., 1995), and may reflect a need to express byssal
structural proteins rapidly in response to developmental cues and
changing environmental conditions. It is well established that in
bacterial systems, codon bias is positively correlated with the rates of
gene expression (Robinson et al., 1984; Varenne et al., 1984; Sorensen
et al., 1989), presumably through selection of codons that recognize the
most abundant isoaccepting tRNAs for a given amino acid. Precedent for
this hypothesis can also be found among highly expressed genes in
multicellular organisms such as Drosophila melanogaster, whose chorion
genes, important eggshell components known to be highly expressed during
egg development (Kafatos et al., 1987), also exhibit significant codon
bias (Akashi, 1994). Such a hypothesis has also been advanced to explain
observed codon bias in the highly expressed silk fibroin heavy chain of
the silk moth, Bombyx mori (Mita et al., 1994).
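
The codon bias described above can be quantified directly from the Table II counts for the six dominant residues. The sketch below computes, for each residue, the share of its codons contributed by the more frequently used half of its synonymous codons. The counts are transcribed from Table II (the glycine entry printed as "GGC," with a count of 3 is read here as GGG, which is an assumption), and the 443-residue total comes from the reported open reading frame; function and variable names are illustrative.

```python
# Codon counts for the six compositionally dominant residues (Table II).
CODON_COUNTS = {
    "P": {"CCT": 12, "CCC": 7, "CCA": 58, "CCG": 22},
    "Y": {"TAT": 57, "TAC": 12},
    "D": {"GAT": 39, "GAC": 9},
    "K": {"AAA": 39, "AAG": 6},
    "T": {"ACT": 13, "ACC": 4, "ACA": 15, "ACG": 3},
    "G": {"GGT": 14, "GGC": 4, "GGA": 11, "GGG": 3},  # GGG: assumed reading
}


def top_half_share(counts: dict) -> float:
    """Fraction of a residue's codons supplied by its most-used half of codons."""
    ranked = sorted(counts.values(), reverse=True)
    half = len(ranked) // 2
    return sum(ranked[:half]) / sum(ranked)


shares = {aa: top_half_share(c) for aa, c in CODON_COUNTS.items()}
dominant_total = sum(sum(c.values()) for c in CODON_COUNTS.values())
dominant_fraction = dominant_total / 443     # residues encoded by the ORF

for aa, share in shares.items():
    print(f"{aa}: {share:.0%} of codons from the preferred half")
print(f"six residues together: {dominant_total} codons "
      f"({dominant_fraction:.0%} of 443)")
```

With these counts each of the six residues draws roughly 78% to 87% of its codons from the preferred half, and the six residues together account for 328 codons, about 74% of the encoded protein, in line with the "almost 75%" quoted in the text.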

More than 80% of the deduced primary amino acid sequence of Dpfp1 is
composed of tandemly repeated and segregated motifs: one is a
heptapeptide; the other, a tridecapeptide consensus motif that coincides
with peptides sequenced previously (Rzepecki and Waite, 1993). The
occurrence of two relatively short tandemly repeating motifs in Dpfp1 is
consistent with its proposed role as a byssal structural protein.
However, the absence of data on the distribution of Dpfp1 within the
byssus makes it difficult to assign a specific role at this time. The
repetitive nature of Dpfp1 is shared by many of the structural proteins
of marine byssi. Two of three characterized Dopa-containing byssal
proteins in Mytilus are known to be composed almost entirely of tandem
repeats. Mefp1, a 110-kDa protein thought to play a role as a cuticular
lacquer in the byssus of M. edulis, is dominated by nonsegregated hexa-
and decapeptide repeats (Filpula et al., 1990; Waite et al., 1985;
Laursen, 1992). Mgfp2, a 49-kDa plaque-specific protein of M.
galloprovincialis, is largely composed of larger, epidermal growth
factor-like repeats (Inoue et al., 1995).

The N-terminal half of Dpfp1 is dominated by a heptapeptide motif that
is repeated 22 times with some variation, particularly at position #7 of
the consensus sequence. Variability notwithstanding, the spacing of
proline and tyrosine residues is well conserved, suggesting that these
amino acids play an important functional role in the motif. No tryptic
peptides exactly matching the deduced primary sequence could be mapped
to this part of the protein; however, a fragment of one tryptic peptide
(tryptic peptide #13 in fig. 6 of Rzepecki and Waite, 1993) containing
the subsequence S-P-L-Y-G-W ... is found to bridge two of the
heptapeptide repeats. Although the tyrosine in this sequence is
efficiently converted to Dopa, the amino acid composition of residual
undigested Dpfp1 suggests that, as a whole, this region contains
relatively little Dopa (Rzepecki, pers. comm.).

Given the frequency of lysine and arginine in the heptapeptide repeat
region, the resistance of the repeat to cleavage by trypsin is
intriguing. An examination of the deduced primary sequence indicates
that K-P or R-P sequences cannot be the basis for this resistance.
Interestingly, lysine and arginine residues in this domain frequently
occur adjacent to threonine and serine residues. That observation,
coupled with the detection of high levels of threonine and
N-acetylgalactosamine in partially digested tryptic peptides (Rzepecki
and Waite, 1993), leads to the hypothesis that Arg and Lys are protected
from trypsin cleavage by adjacent glycosylated amino acids. A similar
protection appears to be imparted by glycosylated residues in an
extensin-like glycoprotein from Volvox carteri (Ertl et al., 1992).
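
The reasoning in this paragraph can be illustrated with a small in-silico digest. The sketch applies the usual trypsin rule (cleavage on the C-terminal side of K or R unless the next residue is P) and separately flags K/R residues that have a serine or threonine neighbour, the arrangement proposed above to confer protection when those neighbours are glycosylated. The test sequences are toy strings built from the consensus motifs, not the Dpfp1 sequence of Figure 3, and the function names are illustrative.

```python
def tryptic_sites(seq: str) -> list:
    """Cut positions (index of the residue following each predicted cut)."""
    return [i + 1 for i in range(len(seq) - 1)
            if seq[i] in "KR" and seq[i + 1] != "P"]


def digest(seq: str) -> list:
    """Split a sequence at every predicted tryptic site."""
    cuts = [0] + tryptic_sites(seq) + [len(seq)]
    return [seq[a:b] for a, b in zip(cuts, cuts[1:])]


def flanked_by_ser_thr(seq: str) -> list:
    """Indices of K/R residues with S or T as an immediate neighbour."""
    hits = []
    for i, aa in enumerate(seq):
        if aa in "KR":
            neighbours = seq[max(i - 1, 0):i] + seq[i + 1:i + 2]
            if any(n in "ST" for n in neighbours):
                hits.append(i)
    return hits


heptads = "PVYPTKTPEYPSQAPVYPSKD"        # hypothetical heptapeptide-like stretch
tridecas = "KPGPYDYDGPYDK" * 2           # two copies of the tridecapeptide consensus

print(digest(heptads), flanked_by_ser_thr(heptads))
print(digest(tridecas), flanked_by_ser_thr(tridecas))
```

In the toy heptapeptide stretch neither lysine is followed by proline, so the sequence rule alone predicts cleavage at both, yet both lysines sit next to a serine or threonine. In the tridecapeptide copies the K-P bonds are skipped and only the K-K junction between repeats is predicted to be cut.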

The N-terminal half of Dpfp1 differs significantly from the C-terminal
domain with its repeated 13 amino acid motif (Fig. 4). Previous peptide
data (Rzepecki and Waite, 1993) and the deduced sequence of Dpfp1 are
consistent with the hypothesis that glycosylation is more extensive in
the N-terminal region of the protein, whereas hydroxylation of tyrosine
to Dopa occurs more frequently in the remaining C-terminal portion.
Additionally, the average isoelectric point of Dpfp1 in the region
occupied by the heptapeptide is moderately basic (pI = 8.7), whereas the
C-terminal domain is quite acidic (pI = 4.7). These divergent
characteristics suggest that the segregation of motifs plays a
significant role in the architectural design of the zebra mussel byssus.
Recently, two byssal structural proteins from M. edulis have also been
shown to be composed of "block copolymer"-like domains. Both proteins
have a central collagenous core flanked by sequences resembling either
elastin (Coyne et al., 1997) or silk fibroin (Qin et al., 1997). The
distribution of these proteins can be used to account for the
heterogeneous mechanical properties of byssus in M. edulis (Qin and
Waite, 1995).

ZEBRA MUSSEL BYSSAL ADHESIVE PROTEIN

A. Heptapeptide comparisons

Protein         Consensus sequence       Repeats   Ref.
Dpfp1           P V Y P - T - K - X      22        a
Mefp1           P S Y P P T Y K A K      75        b
Soybean PRP     P V Y - - - - K - P      43        c
Titin PEVK      P V - P - X - K          27        d

a) present data; b) Laursen (1992); c) Hong et al. (1987); d) Labeit and
Kolmerer (1995)

B. Tridecapeptide comparison

Protein         Consensus sequence       Repeats   Ref.
Dpfp1           KPGPYDYDGPYDK            15        a
Eggshell TRP    Y-G-YDKYG-YDK            27        e

a) present data; e) Wells and Cordingley (1992)

Figure 5. Comparison of the heptapeptide (A) and tridecapeptide (B)
motifs of Dpfp1 with a variety of other repetitive proteins. Shaded
residues highlight identities; gaps denoted by dashes (-) are included
to maximize alignment. X denotes any amino acid residue.
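
The charge contrast between the heptapeptide and tridecapeptide domains (pI 8.7 versus 4.7) is already visible in the consensus motifs themselves. The sketch below is a deliberately crude illustration: it counts lysine/arginine as +1 and aspartate/glutamate as -1 at neutral pH and ignores histidine, tyrosine, Dopa, glycosylation and chain termini, so it is not a pI calculation. The variable position X of the heptapeptide is replaced by alanine as a neutral placeholder, which is an assumption.

```python
def net_charge(seq: str) -> int:
    """Approximate net charge at neutral pH from K/R (+1) and D/E (-1) only."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return positive - negative


heptapeptide = "PVYPTKA"            # P-V-Y-P-T-K-X with X set to A (placeholder)
tridecapeptide = "KPGPYDYDGPYDK"    # C-terminal consensus motif

print("heptapeptide net charge per repeat:  ", net_charge(heptapeptide))
print("tridecapeptide net charge per repeat:", net_charge(tridecapeptide))
```

Each heptapeptide copy carries a net charge of about +1 and each tridecapeptide copy about -1, so tandem repetition of the two motifs in separate domains would be expected to produce one basic and one acidic block.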

Although the consensus motifs of Dpfp1 do not have strong homologies
with any known structural proteins, they do share some features with
other proteins containing tandem repeats, i.e., marine adhesives
(Laursen, 1992), extensin-like proteins from plants (Kieliszewski and
Lamport, 1994), and a trematode eggshell protein (Wells and Cordingley,
1992) (Fig. 5). The β-turn (Pro-Val) and lysine of the heptapeptide are
prominent in extensin (soybean PRP) (Hong et al., 1987) and adhesive
protein (Waite et al., 1985). In addition, although not a repeating
sequence, the PEVK domain of titin, a protein of skeletal muscle,
contains at least 27 occurrences of the motif PVPX(n)K, in which X(n)
can be from one to three amino acids long (Labeit and Kolmerer, 1995).
The tridecapeptide of Dpfp1, in contrast, shares the repeated proximity
of YD with a trematode eggshell protein (Wells and Cordingley, 1992),
although the latter notably lacks proline (Fig. 5). Curiously, all these
proteins have one thing in common: they are significant components of
structures that function in tension.